Example: reproducible data analysis

Based on Jake VanderPlas's series "Reproducible Data Analysis in Jupyter": http://jakevdp.github.io/blog/2017/03/03/reproducible-data-analysis-in-jupyter/

This notebook walks through the whole series: loading hourly bicycle counts from the Fremont Bridge in Seattle, packaging the reusable code into a Python package (fremont), and writing basic unit tests with pytest. To run the tests, run

    python -m pytest fremont

from the package's parent directory. pytest collects the test functions it finds in the package and runs them.
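As an illustration of the kind of test pytest would pick up, here is a minimal sketch. The helper names check_fremont_frame and make_example_frame are hypothetical and not part of the fremont package; the stand-in frame is built by hand so the sketch runs without a download, whereas a real test would call get_fremont_data().

```python
import pandas as pd


def check_fremont_frame(data):
    """Structural checks a test might apply to the loaded DataFrame."""
    assert list(data.columns) == ["West", "East", "Total"]
    assert isinstance(data.index, pd.DatetimeIndex)
    return True


def make_example_frame():
    """Hypothetical stand-in for get_fremont_data(), built by hand."""
    index = pd.DatetimeIndex(
        ["2012-10-03 00:00:00", "2012-10-03 01:00:00"], name="Date"
    )
    return pd.DataFrame(
        {"West": [4.0, 4.0], "East": [9.0, 6.0], "Total": [13.0, 10.0]},
        index=index,
    )


def test_fremont_frame():
    # By default pytest collects functions whose names start with "test_".
    # In the real package this would call get_fremont_data() instead.
    assert check_fremont_frame(make_example_frame())
```

Keeping the checks in a plain function makes them reusable from the notebook as well as from the test suite.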



In [1]:

    
%matplotlib inline
import matplotlib.pyplot as plt
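# Matplotlib 3.6 renamed the seaborn styles to 'seaborn-v0_8...'; use whichever is available
style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
plt.style.use(style)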



In [2]:

    
# Import the helper from the small fremont package. It downloads the Fremont
# bike data (if it is not already downloaded) and loads it into a DataFrame.

from fremont.data import get_fremont_data
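The comment in In [2] describes a download-if-missing pattern. A minimal sketch of that caching step, using only the standard library, might look like the following. The function name ensure_downloaded, the fetch parameter, and the placeholder URL are assumptions for illustration; the actual get_fremont_data implementation may differ, and it additionally parses the file into a DataFrame.

```python
import os
from urllib.request import urlretrieve

# Hypothetical placeholder; the real dataset URL lives inside the fremont package.
EXAMPLE_URL = "https://example.com/fremont.csv"


def ensure_downloaded(filename="fremont.csv", url=EXAMPLE_URL,
                      force_download=False, fetch=urlretrieve):
    """Download url to filename unless a local copy already exists.

    Returns True if a download happened, False if the cached file was reused.
    The fetch argument exists only so the download step can be swapped out.
    """
    if force_download or not os.path.exists(filename):
        fetch(url, filename)
        return True
    return False
```

Passing the download function as a parameter keeps the caching logic testable without network access.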



In [3]:

    
data = get_fremont_data()
data.head()









    Out[3]:

                     West  East  Total
Date
2012-10-03 00:00:00   4.0   9.0   13.0
2012-10-03 01:00:00   4.0   6.0   10.0
2012-10-03 02:00:00   1.0   1.0    2.0
2012-10-03 03:00:00   2.0   3.0    5.0
2012-10-03 04:00:00   6.0   1.0    7.0



In [4]:

    
# Plot weekly totals for each column
data.resample('W').sum().plot()









    Out[4]:





<matplotlib.axes._subplots.AxesSubplot at 0x247f3d0f5c0>



In [5]:

    
# Resample to daily totals, then take a 365-day rolling sum (note the TWO sum() calls)
ax = data.resample('D').sum().rolling(365).sum().plot()
# Start the y-axis at zero
ax.set_ylim(0, None)









    Out[5]:





(0, 1059460.05)
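The two sum() calls in In [5] do different jobs: the first aggregates hourly counts into daily totals, and the second adds up a trailing window of those daily totals. A minimal sketch on made-up data (one count per hour for four days, with a 3-day window instead of 365) shows the effect:

```python
import pandas as pd

# Made-up hourly data: four days, one crossing recorded every hour.
index = pd.date_range("2012-10-03", periods=4 * 24, freq=pd.Timedelta(hours=1))
hourly = pd.Series(1.0, index=index)

# First sum(): collapse each calendar day into a single daily total.
daily = hourly.resample("D").sum()

# Second sum(): add up each trailing 3-day window of daily totals.
rolling = daily.rolling(3).sum()

print(daily)
print(rolling)
```

With an integer window, rolling() yields NaN until the window is full, so the plotted 365-day curve only starts about a year into the data.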



In [6]:

    
# Average count for each time of day, taken over all days
data.groupby(data.index.time).mean().plot()









    Out[6]:





<matplotlib.axes._subplots.AxesSubplot at 0x247f45df9e8>
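Grouping by data.index.time, as in In [6], puts every row with the same clock time into one group regardless of date. A small sketch, using four Total values taken from the tables in this notebook (two hours on two days):

```python
import pandas as pd

index = pd.to_datetime([
    "2012-10-03 00:00:00", "2012-10-03 01:00:00",
    "2012-10-04 00:00:00", "2012-10-04 01:00:00",
])
data = pd.DataFrame({"Total": [13.0, 10.0, 18.0, 3.0]}, index=index)

# One group per time of day; mean() averages that hour across the two dates.
by_time = data.groupby(data.index.time).mean()
print(by_time)
```

Here the 00:00 group averages 13.0 and 18.0, and the 01:00 group averages 10.0 and 3.0.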



In [7]:

    
# Rows: time of day; columns: calendar date; values: total count for that hour
pivoted = data.pivot_table('Total', index=data.index.time, columns=data.index.date)
pivoted.iloc[:5, :5]
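To see what the pivot in In [7] produces, here is a minimal sketch on four Total values taken from this notebook. It reshapes the long hourly series into a grid with one row per time of day and one column per date:

```python
import pandas as pd

index = pd.to_datetime([
    "2012-10-03 00:00:00", "2012-10-03 01:00:00",
    "2012-10-04 00:00:00", "2012-10-04 01:00:00",
])
data = pd.DataFrame({"Total": [13.0, 10.0, 18.0, 3.0]}, index=index)

# Each (time, date) pair occurs once here, so the default mean
# aggregation simply returns the original value.
pivoted = data.pivot_table(
    values="Total", index=data.index.time, columns=data.index.date
)
print(pivoted)
```

Because each column is one day, calling plot() on such a table draws one line per day.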



In [8]:

    
# One nearly transparent line per day, so overlapping days build up visible bands
pivoted.plot(legend=False, alpha=0.01)









    Out[8]:





<matplotlib.axes._subplots.AxesSubplot at 0x247f481d5f8>

Output of In [7], pivoted.iloc[:5, :5]:

          2012-10-03  2012-10-04  2012-10-05  2012-10-06  2012-10-07
00:00:00        13.0        18.0        11.0        15.0        11.0
01:00:00        10.0         3.0         8.0        15.0        17.0
02:00:00         2.0         9.0         7.0         9.0         3.0
03:00:00         5.0         3.0         4.0         3.0         6.0
04:00:00         7.0         8.0         9.0         5.0         3.0